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Abstract 

The main result of this paper is a new exact algorithm computing the estimate given by 
the Least Trimmed Squares (LTS). The algorithm works under very weak assumptions. To 
prove that, we study the respective objective function using basic techniques of analysis and 
linear algebra. 



1 Introduction 

In general, (linear) regression analysis is concerned with problems of the following type. 
One random variable Y called response variable is supposed to fit linear regression model 
Y = x T j3° + e, where x £ W is a vectoiQof explanatory variables (random or not), 0° G W p 
is a vector of regression coefficients and e is an error term. The aim of regression analysis 
is to estimate j3° having n measurements of Y and x. These measurements will be denoted 
as vector Y = (Yi, . . . , Y n ) and as a design matrix 



( x\ x\ ... x\\ 



X = 



,.I J 



(1) 



\ ■ ■ ■ / 



vector Xi stands for a transposition of i-th row of the matrix X. 

The best known estimate of j3° is the estimate given by the (ordinary) least squares 
method (OLS estimate) 

poLs, n ) = ( x T xr 1 x T r, (2) 

which is in fact the projection of Y into the linear envelope of the columns of X. Unfortu- 
nately, the OLS estimate was shown to be very sensitive with respect to data contamination 
of many kinds (for more see [S]). Therefore, other estimates which are less sensitive or, in 
other words, more robust were introduced. One of such estimates is the estimate given by 
the Least Trimmed Squares method (LTS estimate) proposed by Rousseeuw in 1984 [4 . 

OLS estimate ^ is actually obtained as a minimum of the OLS objective function 
(OLS-OF) defined as a sum of squares of residuals ri(/3) = Yi — xf/3, i.e., the OLS-OF reads 

n 

OF (OLS,x.Y ){p) = J2( Yi - xf(3) 2 . (3) 

i=l 

The basic idea of the LTS method is that the contaminating data points lay out of the main 
bulk of data and hence their residuals are bigger. It means that in order to obtain a more 
robust estimate of regression coefficients we ignore (trim) some portion of data points with 



All vectors in this text are treated as column vectors. 



biggest residuals. Formally, the LTS estimate is defined as a minimum of the LTS objective 
function (LTS-OF) 

0^™^)=Erf i) (/3), (4) 

i=l 

where ft is a parameter which determines how many (n — h) data points is to be trimmed 
and r(j) (/3) stands for the z-th smallest residuum at /3. Since it is not reasonable to ignore 
more than a half of data points, h usually takes values between n/2 and n. 

1.1 Algorithms 

As we will see in the following section, there exists a straightforward algorithm always giving 
the exact value of the LTS estimate, but it requires (^) computations of OLS estimates for 
h not trimmed data points. As this algorithm (and its modifications, see pQ) has been the 
only known exact algorithm, another faster ways how to obtain the LTS estimates were 
introduced. All these faster algorithms are probabilistic, i.e., it is not sure they return the 
exact value of the LTS estimate. There exist two kinds of probabilistic algorithms which may 
be described, using terminology from [2], as algorithms finding /3 satisfying the weak and 
strong necessary condition respectively. In fact, /3 satisfies the weak necessary condition if, 
and only if, it is a local minimum of the LTS-OF. Algorithms finding /3's satisfying the weak 
conditions have been proposed independently several times, first such algorithm is from [7J, 
its modification by the same author can be found in [5] , another algorithm of this type was 
introduced along with the notion of weak necessary condition in [2J , and a version for large 
data sets is described in [6]. In the case of the strong condition the situation is simple as 
there is only one representative: Feasible Solution Algorithm [3]. 

Since we are going to study an algorithm solving the problem of minimizing of the 
LTS-OF, we can forget the complex statistical background and formulate it as follows: 

Problem 1. Find the LTS estimate 

h h 

p(LTS,nM) = argmin yVf (/3) = argmin Vfo - (3xJ) 2 , (5) 
/3eR? ~l pew ~{ 

where n > p > 1. Y = (yi, . . . , y n ) T , X — (xi, . . . , x n ) T is a matrix from M. n,p , and h is an 
integer such that p < h < n. 

Further, let us denote the data for which the problem is defined by 

V = {(y h xl)\ien}. 

Prior to introduction of the new exact algorithm, we need to study the LTS-OF as the 
algorithm is based on some special properties of it. Having described these properties, we 
will first propose one-dimensional version of the algorithm which is easier to demonstrate, 
then the general case will be given. 

2 Objective function 

2.1 Discrete reformulation of LTS-OF 

For every f3 G W only h data with least squared residuals appear in Q. Every such h- 
element subset of all data V can be unambiguously determined by 0-1 vector w £ US™, where 
w' 1 = 1 if (yi,xf) is an element of this subset and w l = otherwise - in this sense we will 
be speaking about a subset w. For any element of the set of all such vectors 

Q ( - n ' h) = {w e K n | w l G {0, 1}, i g h, w 1 + . . . + w n = h}, (6) 
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we define two sets 

I w = {keh\w k = l}, (7) 

O w = {k eh\w k =0}. 

Clearly, for any f3 there exists at least one w € Q^ 1 ^ so that X^=i rf^iP) = SlLi w l r 2 ((3). 
Employing this fact we get: 

h n 

minVr^ (/3) = min YVrfOS) (8) 

2—1 Z— 1 

min ( min(WY -WX(3) T (WY -WXP)) (9) 
min \\WY- WX(X T WXy 1 X T WY)\\ 2 , (10) 

where W = diag(w). Having this equation, we can propose a new objective function of the 
LTS defined on Q( n >V 

J{w) = \\W(Y - X{X T WX)- 1 X T WY))\\ 2 . (11) 

It is straightforward that J(w) is the minimum of the OLS-OF for the subset w, i.e. 

we can also reformulate §5§ to the following form 

p(LTS,n,h) = (x T W* Xy 1 X T W*Y, (12) 

where 

w* = argmin J(w) and W* — diag(w*). (13) 
2.2 Domain of LTS-OF 

The discrete version of the LTS-OF proposed in the previous paragraph is well known and 
has already been described in many articles dealing with the LTS, especially with computing 
the LTS estimate. In the present paragraph we shall discuss the non-discrete LTS-OF, i.e. 
OF {LTS ^ h \f3), where PeW. Several of the features, we are going to propose, were already 
mentioned in [5]. We will reprove them and broaden them somewhat. 

Definition 1. We define a relation Z cW x Q^ h ) by 

h n 

((3,w) e z &j2 r Uv=Y, wir iW- 

i=l i—1 

Further, we define a set U C K p as the set where Z is a mapping from R p to Q^ n ' h ^ . 
Complement ofU to W is denoted by %. 

Assertion 1. For (3 G R p there exists only one w S Q( n > h > so that {(3,w) £ Z, i.e., (3 e U, 
if, and only if, r 2 h) ((3) < r 2 h+1) ((3). 

Indeed, if rf(/?) = rf h) (0) = r 2 h+1) (/3) = r 2 (/3) and (p,w) € Z then also (p,w) € Z 
where W is created from w by swopping the i-th. and J-th elements. 

Corollary 2. The following holds: 

% = {^W\r 2 h) (P)=r 2 h+1) {(3)}. 



3 



For every (3 E% there exist i,j € h such that r 2 h ^((3) — r 2 ((3) — r 2 ((3) — r 2 h+1 J(3), this 
equality is equivalent to J"i(/3) = ±r_y(/3) <^> =p % + (zf T ^J)/^ = 0. 

Assumption 1. Let us assume that for Problem^ that 

1. (Vi, j £ n, i ^ j)(xi j= ±Xj), 

2. (Vien)(||a;i|| ^0). 

If Assumption [l] is fulfilled, then =p yj + (xi =F Xj)/3 — is represents a hyperplane, 
i.e., a closed set having Lebesgue measure 0. Since W is a finite union of such sets, it is also 
closed and of Lebesgue measure 0. 

Assertion 3. If Assumption^is fulfilled, we get 

1. fi^CH) = 0, i.e. the Lebesgue measure ofH is zero, 

2. the set U is open. 

Assume that for two different (3± , /?2 eW 

{/3 g R p | = ft + t(#j - ft),* e [0, i]}nK = 0, 

i.e., the line between 0\ and 02 does not cross the set H, then on this line we must have 
r \h)(P) < r ( ! /i+i)(^) an d so = %(Jh)- In words, the space R p is "divided" by the set 

H into a finitcrl number to of open disjoint subsets of U. 

Definition 2. For Problem{l\ we define a sequence of to e N sets w( se «) = {t/i}^ suc/i 
that 

1. Ui is open and connected for all i = 1, . . . , m, 

2. Ui n Uj = 0, for all i, j, i ^ j, 

3. UVL 1 U i =U ) 

4- u^ 1 du i = n. 

We say that Ui, Uj G U^ seq " > are neighbours if i ^ j and dUi n dUj ^ 0. Further, we define 
a set W( TOm ) of m vectors from Q^ 1 ^ 

w ( mm ) = _^ Wm \ w . = Z (p^ where p e Ut,ie to}. 

The sequence ZY( seq ) is uniquely determined by conditions 1, 2 and 3, condition 4 is 
implied by condition 3. The elements of W^ mm ' are correctly defined due to the fact that 
Z(j3) = Z(f3) for all (3,(3 € Ui where i G fa arbitrary. 

2.2.1 One-dimensional example 

As the above introduced definitions and assertions are crucial for understanding all the 
results of the following paragraphs, we will demonstrate their meaning on an example. The 
simplest instance of Problem [T] is the case of p — 1 when the argument of the LTS-OF is a 
real number (3 € R 1 . 

Let us assume that Assumption [l] is fulfilled. Then all residuals rf = (yi — Xi(3) 2 , as 
well as an arbitrary sum of them, are sharply convex parabolas. Thus, for every subset 
w G Q {n ' h) the function OF i - OLS ^ WX ' WY \(3) = £™ =1 w l {y i -x l (3) 2 is also an sharply convex 
parabola and the value of the discrete function J(w) is a minimum of it. 

2 The finiteness of the number of the subsets will be proved later; we will prove that m < ( j) ™ 1 )2 p . 
3 By definition, an open set A is connected if it cannot be represented as the disjoint union of 
two or more nonempty open sets. 
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Example 1. Find the LTS- estimate of Problem^ for the following settings: 



• n — 9, p — 1, h — 5, 

• Y = (-0.90, -0.80, 33.32, -27.23, 12.63, -14.18, -3.79, -8.66, -16.45) T , 

• X = (1.39, -2.25, 6.10, -8.50, 8.26, -8.67, 10.87, 13.70, 13.05) T . 

As j3 is a scalar, it is easy to draw the graph of OF^ LTS - n,h \p) for Example [l] What 
we need is to know is how to determine the value OF^ LTS ' n ' h ^ (f3) for a given j3. It can be 
easily done by evaluating and ordering the squared residuals rf((3) for all i. Employing the 
definition of the relation Z, we can say that we need to find a subset w such that (ft, w) E Z 
- in other words, we need to find the parabola OF^ OLS ' WX ' WY \j3), corresponding to the 
subset w, such that OF^ OLS ^ wx - WY \f3) = OF^ LTS ' n ^ h \[3) for the given f3. The sector of 
the graph of OF( LTS > n > h \f3) for Example [l] containing all the local minima is depicted in 
Figure [T] 




1 ' L- 1 — 1 ■— 1 1 ' — 1 

-2 -1 n.O >:■■ 1 h 2 3 h,< 

P 

Figure 1: The bold line is the graph of the LTS-OF, the other parabolas (thin lines) 
are graphs of OLS-OF corresponding to various data subsets w G Q^ 9 ' 5 \ 

It is clear that for data X Assumption [l] is fulfilled. The second part tells us that all 
squared residuals have parabolas as a graph and the first part that for any two parabo- 
las the intersection of their graphs is a set of Lebesque measure (a point, in the case 
of p = 1). Using our notation, we can reformulate the last sentence in this way: the 
set H. is the set of f3 for which more than one parabola coincides with the graph of the 
LTS-OF or, equivalently, the set of f3 for which more than one subset w is in the re- 
lation Z with f3. Denote H = {hi, . . . , h m -i}, where m = jfU( scq \ For Example [I] 
H = {-8.16, -7.44, -6.99, -3.92, -0.18, 0.25, 1.21, 3.20, 3.84}. 

Regarding the set U, we know that U = R 1 \ H, thus the set U is a union of m open 
intervals U\, . . . , U m . It is obvious that the sequence 

U(seq) equalg tQ the 

sequence of these 

intervals, i.e. W( seq ) = {Ui}^ 1 . All the sets Ui and all the corresponding vectors Wi £ W^ min ^ 
are given in Table [T] 
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i 


Ui 


{j G 9 | w\ = 1} 


i 


Ui 


{j € 9 | w{ = 1} 


1 


(-oo,-8.16) 


1,2,3,5,6 


6 


(-0.18,0.25) 


1,2,5,7,8 


2 


(-8.16,-7.45) 


1,2,3,5,7 


7 


(0.25,1.21) 


1,2,5,6,7 


3 


(-7.45,-6.99) 


1,2,5,6,7 


8 


(1.21,3.20) 


1,2,4,5,6 


4 


(-6,99,-3.92) 


1,2,5,7,9 


9 


(3.20,3.84) 


1,2,3,4,6 


5 


(-3.92,-0.18) 


1,2,7,8,9 


10 


(3.84,+oo) 


1,2,3,4,5 



Table 1: The sets Ui and the corresponding Wi for Example [T} 



Note that in general we have 



i ^ j =fr ^ Wj . 



For our example data u>3 — wj. Note also that not all sets U must contain a local minimum. 
For us there are only 4 local minima: (1) in /3 = —0.77, value 71.96 (2) in j3 = 0.14, value 
242.42 (3) in /? = 0.70, value 246.87 and (4) in /? = 2.06, value 156.15. 



2.3 Local minima of LTS-OF 

Now we will try to append to hitherto shown features and proposed notation some others, 
which will be useful from the point of view of the minimization of the objective function 
OF (LTS.n,h)^y without doubt it would be very useful to know if there exists a global 
minimum or if there could be more than one local minimum. Taking into account simulta- 



neously the discrete form of the LTS-OF, ( 12 1 and ( 13 ) we have the proof of the existence 



of the global minimum and also the alternative formula for it. 

As for the local minima, we have to employ the notation and facts from the previous 
paragraphs. We know that the domain of the LTS-OF can be written as a union of the open 
sets £Y( SGq ) = {U}™, ! and the set of measure zero H. We also proved that for all i = 1, . . . , m 
and for all f3 G U we have OF^ LTS ' n ^{(3) = OF<> OLS < w * x > wiY )(j3)) where W i = diagK) 
and w l G tT( min ) ( see Definition |2|) . This fact has an important consequence. 



Definition 3. We say that a matrix X G R™ ,p , n > p, has h-full rank if a rank of the matrix 
WX is p for all w G Q^ h \ W = diag(iu). 

It is well known that the OLS estimate is unique if, and only if, the design matrix 
has rank p. As we compute the OLS estimate for /i-element subset, we need the previous 
definition. 

Assertion 4. The objective function of Problem[l\OF t - LTS ' n ' h ^ (J3) has a local minimum in 
Po G Ui G lA^ seqS> if, and only if the function OF^ OLS W x,w Y \fl)) has a local minimum in 
/3 G Ui,i G m. 

Moreover, if X has h-full rank, then 



OF iLTS ' n ' h) ((3) has a local minimum at (3 G U f3 
where W l = diag(wi). 



How strong is the assumption that X has h-iul\ rank? It depends on the values of 
parameters p and h which determine the dimensions of the sub-matrixes of A. If h 3> p 
(usually true), then the assumption is very weak. 

Assertion [4] tells us how to find local minima located in the open set U. What if a local 
minimum is in the set W7 In what follows, we will prove that even if a local minimum is in 
the set "H, it can be still found as the OLS estimate for some subset w G Q^ n ' h \ 
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Lemma 5. 1. Function f : R n -> R 1 , f £ is a strictly convex function if and only 
ifS/ 2 f(x) is a positive- definite matrix. 

2. A strictly convex function has maximally one strict minimum. 

This Lemma is a classical result of mathematical analysis. 

Lemma 6. For Problem[J\ and every subset w £ Q( n >"-) it holds that if the matrix WX has 
full rank (W = diag(w)), then the function OF^ OLS,wx,WY \j3) is strictly convex. 

The proof follows from from Lemma |] and from the fact that \/2q F (ols.wx,wy) ^ = 
X T WX, where X T WX is positive-definite. 

Lemma 7. Let functions fx, ■ ■ ■ , fk be continuous having unique strict minimum, fi : R p — ► 
R 1 , and let h(x) = mxn{fi(x), . . . , fk(x)} for all x £ MP. Define a set S = {x £ R p \ fi(x) = 
■ ■ ■ = fk(x)}. If h has a local minimum at xq £ S, then fi has the strict global minimum at 
Xg for all i = 1, . . . , k. 

Proof 

If h has a local minimum at xq £ S, then there exists a neighbourhood U( Xo ) of xq such 
that 

(Vx £ U( Xo) )(h(x) > h(x ) = fi(x ) ■■■ = fk(xo)). 

The same inequality is clearly true on J7( Xo ) for all fi(x) (note that h{x) < fi(x)), and hence, 
due to the fact that all fi{x) have only one strict minimum, xq has to be also the point of 
the global minima of the functions fi, i = 1, . . . , k. 

Q.E.D 

Now we can propose the following not-surprising but important theorem. 

Theorem 8. If the matrix X from Problem{]}has h-full rank, then for every local minimum 
at a point /3q of the objective function OF^ LTS ' n ' h ^ (fj) there exists a vector w £ W^" 11 ^ such 
that 

(3 = (X T WX)- 1 X T WY, 

where W — diag(iu). 
Proof 

If 0q ElA, the proof follows directly from Assertion |4j 

Let us assume that {$$£%. It means that there exist more than one subset being in 
relation Z with /3q. Let us denote these subsets by uii 1 , . . . , uii k , k > 2. Now employing the 
previous lemma - putting fj = 0F^ OljS ' Wi i X ' W ^ Y \l3) for all j - and Lemma ^ we can be 
sure that /?o is a point of global minima of all functions fj. From Definition [2] we also know 
that T~L = Ui e7 %dUi. Thus, taking into account that f3o £ H, there are at least two of the 
subsets Wi 1 , . . . , Wi k , k > 2 (corresponding to two neighbours from U^ ^ - see Definition [2J 
which are elements of V^( min ). 

Q.E.D 



3 Borders Scanning Algorithm — BSA 

In the present section we shall propose a new algorithm for solving Problem [T] A principle 
of the algorithm is quite simple. It is based on the fact that 

0F (LTS,n,h) {j3 s min 0F (OLS,WX,WY) {p) 

weQ(n ' h) . . (14) 

= min F(° LS < w * x > wtY \p), 

i—l,...m 
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where W — diag(iu),W* — diag(wj) and m and Wi G W^ minS> are defined in Defini- 
tion [2] This equation claims that to get complete knowledge of the complicated function 
LTS-OF, it is enough to evaluate only m sharply convex objective functions of the OLS 
estimate OF ( - OLS > WiX > WiY '> (P). Taking into account ([To}, §U§ and Theorem [8) we can 



reformulate ( 14 ) as 



min > rf^(B) = mill J(w) = min J(wA, 

i—1 

i.e. the LTS estimate for Problem [T]can be obtained by evaluating J{wi) for all i G rh. More 
or less, most of algorithms take advantage of this fact. The question is how to determine 
(all) subsets Wi G W^( mm ) most effectively. At first, we illustrate how the BSA does it in the 
one-dimensional case. 

3.1 One-dimensional case 

As written above, for Example [l] the set W^ 1 ^ consists of m = 10 elements and the set 
% contains 9 points h\, . . . ,h$. For each point hk,k G 9 there exist exactly two subsets 
Wk 1 and Wk 2 from W^ min ^ such that (/i^wt,) G Z and (hk,Wk 2 ) G Z. These two subsets 
correspond to two parabolas whose intersection has a coordinate hk and that can be easily 
determined for a given hk using the following algorithm (which works for arbitrary p). 

Program 1. How to find all w G Q( n > h ) such that (/3,w) G Z for a given (3 G K p : 

1. For all i G n evaluate squared residual rf(/3) and order them. Define ik G {1, ■ • ■ , n} 
by rf k (/?) = rf k) (/3) for all k = 1, . . . , n. 

2. U r fh)(^) ^ r (h+i)(@) then return the unique subset w such that (w l = 1 rf(/3) < 
V r fh)(@) ~ r (7i+i)(^)> ^ us su PP ose that 



r? 08) < ••• < r?C8) < r? +1 (0) = ••• = r« a h 03) = 

= < +1 W = • • • = r? +t 08) < r? I+w (/?)<■■•< < (/3). 

TTien return all the subsets wsuch that I w (see |7| ) contains I indices corresponding to 
i\,...,ii and arbitrary (h — I) -element subset of indices corresponding to ii+i, ■ ■ ■ ,ii+t- 
Hence, there are subsets in relation Z with (3. 

In general, if Assumption [l] is fulfilled in the case of p = 1, that for every Wi G W^ min - ) 
there exists at least one point h G T-L such that (wi,h) G Z. Taking this into account, we 
can state: if the set TA. = {hi, . . . , h m -\\ is known, all the subsets T / F^ min - ) can be obtained 
by performing Program [T] for each hk G H. 

Only remaining task is how to determine the set T-L. According to its definition, a point 
f3 is an element of T-L if and only if r^(f3) — r? h+1 J/3), i.e. this equality is sufficient and 
necessary condition. Since the condition contains ordered residuals, it can not be used 
directly - at first we need to find some candidates for which we will perform ordering of 
residuals. These candidates can be determined by the following necessary condition: if 
(3 G Ti, then there are two distinct indices i,j such that rf(f3) = r|(/3). Let us denote the 
set containing all f3 G K 1 satisfying this necessary condition by H, i.e. 

H = {l3em}\rl{P)=r](fi),i^ ] }, (15) 
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obviously #H < 2( = 2(") = n(n — 1) (note that the equation is quadratic, i.e., it 
has two solutions ^'_^ J and ^'^ J - we still assume that Assumption jlj is fulfilled, hence 

Now we already have all necessary for proposing a one-dimensional version of BSA. 

Program 2. BSA in the case of p = 1. 

Denote the elements of the set Q^ n ^ by {v\, . . . ,v^}. 

1. Set k = 1 and J m i„ = +oo. 

2. Denote the indices of two data points from subset Vk by i,j. 

Save solutions of equation rf(/3) = r|(/3) as j3\ and $2, i.e., /3i = ^'_^ J and f3% — 

Vi+Vj 

Xi+Xj 

3. Evaluate and order residuals rf(/3i), I — 1, . . . , n. 

^. 7/ >( ft) (/3i) = ffh+i)(A) fi-e-, A € 77J, find subsets w (1) ,w (2) e which are in 

relation Z with fi\ (use Program^. 

5. If J(w (1) ) < J min , put J mm = J(w^) and fi min = /3 X . 
If J(w (2) ) < J mm , ptit J mm = J(u;( 2 )) and (i mm = /3 2 . 

6. If Pi ^ P2, repeat last two steps for $2 (modify J m i n and /3 m i n accordingly). 

7. If k < (™) , put k = k + 1 and go back to step 2. 

8. Return f3 m i n as the LTS estimate for Problem^ 

This algorithm works is finite if the 77 contains only a finite number of points. Assump- 
tion [T] is a sufficient condition for this; it is not a necessary condition, but still it is very 
weak and easily verifiable. 



3.2 Multidimensional case 

In the case of p > 1, the situation is more complicated. The source of complication is the 
fact, that the set 77 contains infinitely many points. In order to resolve this problem, we 
need to find some finite subset of 77, let us denote it 77 p , having the following property: for 
every w G V^ (min) there exists f3 £ 77 p such that (j3, w) G Z. 

Analogously to the case of p = 1, we will be looking for candidates for being an element 
of 77 p in the set H, namely in some suitable finite subset H p for which 77 p C H p C H . In the 
case of p — 1, the equality of the type rf(f3) — r|(/3) can have at most two solutions, in the 
case of p > 1, this equality only "decrements" the dimension by 1, i.e., the dimension of its 
solution is p — 1. But we need the dimension to be zero, this can be reached by considering p 
"independent" equations of the type rf((3) = r 2 (/3), in other words, a system of p equations 
with p unknowns /3 , . . . , /3 P 

*£(/?) = rf 2 cs) 



(16) 



where ii, . . . , i p +i corresponds to one of (p+ 1) -element subsets of n = {1, . . . , n}, i.e., to one 
element of Q( n > p+1 K Unfortunately, as we will prove later on, the system ( p~6[ ) is equivalent 
to 2 P linear systems of p equations. If all these systems are regular, then original system 



( 16 ) can have up to TP solutions. Taking into account this number and the fact that there 
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are ( = #Q^™' P+1 ^ different systems of type |l6]), the set H p , which is to be defined as 
a set of solutions of all such systems, contains ( p " 1 )2 p points from Rp. 

In the next section, all the sets introduced above (sets H, H p and H p ) will be redefined 
precisely and their mentioned (and some others) properties will be proved. In particular, we 
will propose assumptions which allow us to prove that the set % p is a suitable set for BSA. 



3.3 Set H p 



The goal of this section is to find a set % p for which it holds that for every w G 

^(min) there 

exists at least one /3 G T-L p such that (/3, w) £ Z. We will define it as it was hinted above, 
therefor we will need to define the sets H and H p containing candidates for being elements 



of H and H p . 



The set H is to be defined as in (15 1. The quadratic equation of the type rf((3) 



Jj(/3), i,j & n is equivalent to two linear ones 

{ Xi +x,j) t I3 = y z + y 3 
{Xi-Xj) p = Vi-yj, 

each defining p — 1-dimensional hyperplane. 

Definition 4. Let us denote 

= {/3 G Rp | rl{p) = rf - {(3 G Rp \ Vi - xjf3 = ±( Vj - xffl} 
hW,-) = {j3eRp\y i -xJ/3 = (y j -xfl3)} = {PeRp\y i -y j = (xl-xf)l3} 
H^' +) = {/3emp\y i -xJj3 = -(y j -xjp)} = {p€Rp\yi + y j = (xj + xj)(3} 

for every i, j G n, i ^ j and 



H 



Obviously the set H from this definition is the same one as the definition given in ( |15[ ). 

It is apparent that % C H. We also know, that the sets Ui,i G rh from Definition [2j are 
separated by the set T-L. The hyperplanes H^' + ' and H^'~' divide Rp into closed convex 
sets, so-called polytops, let us denote them Pk, k G K, where K is some finite number. We 
get that \J ke ^-dPk = H D H = Ui^mdUi and it implies that for every Ui there exist convex 
sets Pfej , . . . Pfe, , I > 1 such that Ui — U^jP^. and also that m < K. 

It is well known fact, that a bounded polytop equals to a convex envelope of points 
which are intersections of the hyperplanes bordering the polytop. The smallest number of 
hyperplanes allowing their intersection to be a point (i.e. the set with dimension 0) equals 
the dimension of the space. For us the space is Rp and the dimension is p. We propose some 
more notation to express what this fact means for our particular case. 

Let o g {+, — } represent one of the arithmetical operations, either an addition or a 
subtraction, i.e. xoy = x + yifo = + and x o y = x — yifo = — . 

Let (3 G H be an intersection of q + 1, q > 1, sets of the type ifW'^) such that 
p G ff(' Ili2l± ' U ff(*3.ia.±) u • • • U ff(w 9 +i,±) where h, . . .,i q+ i G n are distinct. It means 
that (3 is a solution of the following system 

4(/?) = ri(/?) 
r?,G8)=r? +l C8). 
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Note that if rf ± (f3) — r^(f3) = i"? h+1 -.(/3) then, moreover, j3 G %■ This system of equations 
is equivalent to the following 2 q systems 

(x h oj x l2 ) T f3 = y it o l y l2 

: : (18) 

(x iq o qXlq+1 ) T j3 = y lq o q y tq+1 . 
where (o 1; . . . , o q ) is an arbitrary element of the set product x* =1 {+, — }. 



Definition 5. Let (3 G H be a solution of system (18) of q equations where (o 1 ,... ! o g ) g 



x? =1 {+,— }. Further, let us assume there exists no i q +2 € fi \ . . . , fl^ci o g+1 6 

{+,-} so that (x lq+1 o q+1 x lq+2 ) T (3 = y lq+1 o q y iq+2 (i.e. r? (/?) = ... = r? +1 (£) = rf q+2 ((3)). 
Then we define an order of (3 



Ord(/3) = number of linearly independent equations in (18). 

We afeo define a set of zero-dimensional intersections 

H p = {(3eH\ OrdCS) = p} 

and its subset T~L P of such (3 G i/ p /or which rf^/3) = . . . = r^ +i (f3) = rJ h ^(f3) = r? ft+1 ^(/3), 
where i 1; ...,ip +1 G n are indices from the system of p linearly independent equations of 



type (18) 



Now we have all the notation necessary for proposing the assertion which, despite being 
very simple and natural, will be used as the basis of BSA. At first, let us prove the following 
useful lemma - to be able to do it, we need this very weak assumption. 

Assumption 2. 

(V/3GMp)(r^(/3)>0), 

Q p(LTS,n.h) 0(LTS,n,h)\ > q 

Lemma 9. Let us assume that Assumptions^ and^ are fulfilled and let B C Kp be a set 
containing all solutions of the system of equations 

(x h oj x l2 ) T (3 = y h o x y l2 



(19) 



(xi. o q x lq+1 ) T /3 = y lq o q y lq+l: 



where i q +i G n are distinct, q > 1, o 1; . . . , o q G {+, — } and rf^/3) = • • • = rf (/3) = 

r \h)(fi) = r ^j+i)(^)- ^ ften / or a ^ j>k G q + l,j ^= k and o G { + , — } either 

^P€B)((x ij ox ih ff3 = y ij oy ik ) (20) 

or 

is trite. 



Moreover, there exist o ;i ,...,o/ g {+, — } sucft £aa£ system (19) is equivalent to the 
system 

{x H o h x l2 ) T (3 = y H o h y l2 

: : (21) 

fati % x lq+1 ) T (3 = y n o lq y iq+1 . 
This holds even if r? 7^ r? ft) (j8). 
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Proof 

Assumption [2] implies that for all s, t £ n, s ^ t the system 



(x s + x t ) T (3 = y s + y t 
(x s - x t ) T (i = y a - y t 



(22) 



has not any solution j3 £ M.p such that r^(j3) — r^(f3) = r?U(/8). Indeed, if /?o is a solution of 
both equations (22) and r 2 s (f3) — rf(/3) — r 2 h ^(/3), then r s (/?o) = rt(Po) an d simultaneously 



r s (/3o) = —r t (/3 ) and so r 2 h ^(/3 ) = 0, i.e. Assumption |2j is not fulfilled (if r^(Po) — then 
certainly (3 = p( LTS ^ h )). 

Now, let us assume, without loss to generality, that j < k holds for j, k. Then for any 

f3 £ B 

(x i:l Qj x l]+l ) T f3 = y h oj y l] + l 



(x i]+1 o j+1 Xij+2 ) /3 = y Z]+1 o j+1 y 



If j + 1 = k, then, according to Assumption [2] and the first paragraph of this proof, equation 



(20) holds for all /3 € B (when = 0^) or it doesn't hold for any of them (when o ^ 
Further, let k be greater than j + By adding together the equations, in the case of Oj = — , 
or by subtracting them, in the case of Oj = +, we get the equivalent system 



(x^ x ij+1 ) /3 = yi . oj y lj+1 
(xi j Xi j+2 ) T f3 = yy yi j+2 
(x ij+2 o J+2 x tj+3 ) T /3 = y h+2 o j+2 y ij+3 . 

We can repeat this step r-times, r < q, until j + r + 1 = k and finish the proof of the first 
part of the lemma by employing again assumption [2] for the system 

{Xi 3 Oj x lj + 1 ) T f3 = y h oj y lk 
(x t] o' k _ x x lk fl3 = Vi . o' k _ x y ik . 

The second part of the lemma can be proved by repeating the same steps for j '• = 1 and 
k = q + l. 

Q.E.D 

The first part of the lemma tells us that each equation of the type (xi j o Xi j+2 ) T (3 — yi 3 o 



yi k ,j,k £ g + l,o e {+, — } is either equivalent with system ( 19 ) or set of solutions of this 
equation is disjoint with the set B (due to Assumption [2]). 

Now we can proceed to proving the most important assertion of this section - we shall 
prove that T-L p is suitable (in a sense of the previous section) set for BSA. Is it true in general 
or we have to assume that the data from Problem [T] satisfies some condition? The answer 
is that we have to propose some assumption which is, however, quite weak. The reason is 
that it could happen that the set H p is empty. For example, in the case of p = 2, if all 
hyperplanes of the type if ( l >-?'°) are parallel, then there is no zero-dimensional intersection. 
It happens if and only if all vectors Xi from Problem Q] are parallel. Then all systems of 
equations of the type 

(x n o-l x l2 ) T (3 = y k o-l y l2 
(Xn o 2 x l3 ) T (3 = y n o 2 y l3 

are linearly dependent and so they have either no solution or the set of all solutions is one- 
dimensional hyperplane. To avoid such a situation we propose the following assumption. 



12 



Assumption 3. For all (o lf . . . , o n _ 1 ) <e x™ =1 {+,— } the matrix of the system of n — 1 
equations 

(xi o! x 2 ) T P = J/1 Ox J/2 
(xi o n _! a;„) T /3 = j/i o n _i j/„ 

has rank p. 

Obviously, it prevents the situation described above. This assumption has a consequence 
which will be crucial for the proof of the main assertion. Let us formulate it as a lemma. 

Lemma 10. Let us assume that data from Problem^ satisfies Assumption^ If p > 1, then 
for 

(VZ G {2, . . . ,p})(Vi u ...,»! G n)(V°i, . . .,<>,_! G {+, -}) 
it holds that if the system of I — 1 equations 

(x^ o x X i2 ) T /3 = J/jj oj y i2 

: : (23) 

(x^ o;.! x H ) T (3 = y i± o l _ 1 y H 

has rank I — 1, then there exist G n \ {i\, . . . , i{\ and o ( G {+, — } such that the system 
of I equations 

{x n o x x t2 ) T f3 = y il o 1 y t2 

o;_j 2Cj,) T /3 = j/ij oi-x y it 
(x ix o ; x ii+1 ) T /3 = y h oi y il+1 

has rank I. 



Moreover, if (3 G Mp is a solution of (23 1 smc/i i/iai = ri h -,{f3) = T f} l j r i^{P) then 



il+i can be selected so that there exists (3\ such that rf (j3i) = ■ ■■ = rf (/3i) = rf (/3i) 
r 2 (h) {Px) = rl h+1) {(3) 

Proof 

Let us denote the elements of the set of indices h \ {i\, . . . , i{\ by {ji+i, . . • , j n } and let 
us select signs oj, . . . , o n _ 1 arbitrarily. Then, due to Assumption |3j the system of equation 

(x n o x x t2 ) T f3 = y il o a y i2 



(x it oi_ 1 x ll ) T (3 = y il o l _ 1 y. H 
(x tl o ; x Jl+1 ) T (3 = y h oj y Jl+1 

(x h o n _ 1 x Jn ) T f3 = y il o n _! y jn 
has rank p. We also know that I — 1 vectors 
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are linearly independent. Then, due to Steinitz Theorem, there exist indices jj+ij . . . , j P +i 
and ki, . . . ,k p (in other words, there exist p — l rows of the matrix of the system above) such 
that p vectors 

((X h O x X i2 ),..., (X h 0;_! X k ), (x h o kl X jl+1 ), (X h o fe; + 1 Xjp )) 

form a base of the vector space WLp. Obviously, as the index can be then taken each 
index from {ji+i, ■ ■ ■ , j p +i}- 

The second part of the lemma is a consequence of the first one and the continuity of 
squared residuals. Let us denote the set of all solutions of the system ( p3| by B C Rp. We 
know that there exist /?, ft G B and such that ^(/S) = ■•• = (ft = r^(f3) and 
r fi(^i) = • • • = r 2 ; jC/Si). Than, due to continuity of squared residuals, there must exists 
ft G 5 and such that r? (ft) = ••• = r? +i (ft) = r 2 ft) (ft) = r 2 ft+1) (ft). Note that it 
could happen that ft = or ft = ft. 

Q.E.D 

Assertion 11. Let us assume that Assumptions [Ij and [3| are fulfilled. If for the set 

Ui G U( se<1 \ie m ZioMs i/iai <9C/ 4 7^ 0, t/iera 

(3/3 G Kp)(/9 G 9f/,- n Hp), 

i.e. there exist . . i p +i G n and o 1; . . . , o p £ {+, — } suc/i i/iai /? is £/ie onZj/ one solution 
of the system of p linearly independent equations 

(x h oj x l2 ) T ^ = y it o l y l2 

{ x i P °p x i p +i) ft = Vip °p Vip+n 

where moreover rf^fi) — r 2 ^(/3) = ^^ +1 )(/3) is true. 
Proof 

Within the proof we shall partially use a syntax of computer programming - g will be 
treated as a variable which is a parameter of a loop, hence g = g + 1 means incrementing q 
of 1. The instruction go to <s? means "go back to the line beginning with the sign V. Let 
us also suppose that p > 1, if p = 1, then the assertion is trivial. 

Put q = 1. As dUi 7^ 0, there are ft 1 ' G dUi, i\ G I Wi and i 2 G Wi (see ^ and for Wi 
see Definition^ so that rf h) (/3^) = rf h+1) (/3^) = r? (/3W) = r 2 2 (/?W), i.e. there exists o x 

such that /3 (1) G H^' i2 ' 0l \ i.e. /3 < - 1 ^ is a solution of equation 

(x h oj a; l2 ) T ^ = y 2l y l2 . 

According to the previous lemma, there exist 13 G h, o 2 G {+, — } and ft 2 ' G Rp such that 
{3^ is a solution of the system 

{x h oj ir l2 ) T /3 = y l2 
{xi, o 2 x l3 ) T f3 = y n o 2 y l3 , 

where moreover rf^ft' 2 )) = t" 2 ^^ 2 ^) = r ^ + i)(/3^ 2 ^)- 

V Put q = q + 1. If q = p, the proof is completed. If it is not, the indices ii,...,i q , 
signs o 1; . . . ,o q _ 1 satisfy the assumptions of the previous lemma. Hence, there is an index 
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iq+i, a sign o q and (3^ such that 

(x h o 1 x i2 ) T /3 (9) = y h o x y i2 



{x iq Oq X lq+l ff3 (q) = y 4g o q y lq+1 
and rf i (/?W)=rf h) (/3W)=r^ +1) (^)) 

Go to 9. Q.£.£> 

The assertion ensures us that H p is a suitable for BSA and so we can propose also the 
multidimensional version of BSA. At the end of this section let us propose the following 
corollary. 

Corollary 12. Let us assume that Assumptions [7J [J| and[3| are fulfilled. For the number 
m = fHA^ seq ^ (see Definition^ we have 

m<( U V- 
" \p+lj 

Proof 

Due to Assertion 11 we know that m < Then, the proof follows from the fact that 

U p CH P , where #H P < {^V. 

Q.E.D 



4 BSA— new exact algorithm 
4.1 Description of algorithm 

The multidimensional version of BSA is a straightforward generalization of Program[2] Note 
that if Assumption [T] is satisfied, then in the case of p = 1 it holds that T~L = H p . If p > 1 
then Hp C "H. In short, BSA can be described as follows: find all f3 £ H p , by ordering 
the residuals verify whether j3 is also element of Hp. If it is, find all w £ Q^ n ' h "> such 



corresponds to w* (see (13)). 



that ((3,w) £ Z and evaluate J(w) (see (111) for them. The minimal obtained value then 



Program 3. BSA - generic definition for all dimensions. 

Denote all the elements of the set Q^ n ' p+1 ' ) by {v±, . . . , v/ n \} and all the elements of the 



cartesian product x P =1 {+, — } by {°^, . . . , a' 2 ^}, where oW = (o^\ . . . , o p '), i = 1, . . . 2 P . 

1. Set k = 1 and J m i n = +oo. 

2. Ifk> ( p l t ) go to step 9. 

3. Denote the indices of data from subset Vk by ii, . . . , i p +i, hence i\, . . . , i p +i £ n are 
distinct and v% = ■■ • = V k P+1 = 1. Put 1 = 1. 

4- If I > 2 P , put k = k + 1 and go to step 2. 

5. If the system of equations 

(x n o\' x t2 ) P = y H o\' y i2 



(24) 



is regular, then denote its solution by /3q, if it is not, put 1 = 1 + 1 and go to step 4- 
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6. Evaluate and order residuals r 2 (ft ). 

7. J/r? (A)) = rf h) (Po) = rf h+1) (0 o ), find subsets wC 1 ),...,^ G w/mc/i are in 
relation Z with ft (use Program [7]). 

«S. For j = l,...,g evaluate J(w^~>). If J(w^) < J min , put J mm = J(w^) and w mm = 

9. Put 1 = 1 + 1 and go to step 4- 

10. Return ft = ft( OLS ' W ^" X ' W ^" Y \W min = diag(w rmn ) as the LTS estimate for Prob- 
lem^ 

Several of steps need to be commented on in details. Prior to that, let us prove that 
BSA always find the exact LTS estimate for Problem [T] 

Theorem 13. If data of Problem^ satisfy Assumptions [7J [J| and [3| then BSA returns the 
LTS estimate ft( LT S,n,h). 

Proof 

The proof follows straightforwardly from equation (14) and Assertion 11 which tells us 
that during BSA we evaluate J(w) for all w G W( mm ). 

Q.E.D 

Now, let us comment on the steps. 

steps 2 5 The goal of the algorithm is to find all points of the set H p and for every 
ft G Hp find all Ui,i G rh and the corresponding uii such that ft G dUi- In order to 
find all elements of H p , it is necessary to find all elements of H p and it requires to 
resolve all possible systems of equations of type (24), i.e. it is necessary to go through 
all ( p " 1 )2P possibilities. 

step 5 Due to assertion [TT] we know that "in most cases" we can omit non- regular systems 
without losing the assurance that we will find all the elements of H p (see the proof) . 
Non-regularity would become a problem only if Assumption [3] is disrupted, i.e. at least 
one matrix (n — 1 X p + 1) would not be regular. The assumptions will be discussed 
later on. 

step 7 In this step fto is surely an element of H p , to verify that fto G T-L p it is necessary 
to verify whether r? (ft ) = rf h) (ft ) = rf h+1) (ft ). If r? (ft ) ? rf h) (ft ) = rf h+1) {ft ), 
then either ft 1-L p or it will be found during the loop for another value of the 
parameter k. 

If we assume that the equality r^(/3o) = i"f k Jfto) has the same probability for all k E h 
and for all fto G H p , then the probability that the algorithm will go to step 8 from 
step 7 (i.e. that r?(A)) = r 2 {h) (ft ) = r\ h+l) {ft ) holds) is p/(n-p+ 1). 

step 7 and 8 How many w can be in relation Z with ft ? The number g depends on 
I G {0, 1, ... ,p — 2} for which rf i (ft t ) = r f h ^i)(ftt) and equals (j^J- The worst case is 
([p/2]) an< ^ ^ ne best one is p (see also Program jlj). 
To conclude the basic description of the algorithm, we will calculate the complexity of 

it. 

In order to compute the exact LTS estimate ft( LTS ' n ' h '> by BSA it is necessary to 
• successively select all (_"]) elements of the set Q( n ' p+1 \ 



( • 2 P times resolve system (24) of p equations, 
( p !y ■ 2 P times evaluate and order n residuals, 
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• ( p +i) • 2 p • ^z|+t ' ([p/2]) t ime s calculate the OLS estimate p(.OLS,wx,WY) _ 

Of course, the complexity further depends on numerical methods used for ordering of the 
residuals and solving the systems of equation (in step 4 and also during calculating the OLS 
estimate in step 6) but such a discussion is beyond the scope of this work. 



4.2 Assumptions — verification and disruption of them 

As written above, Assumption [I] is quite weak and moreover easily verifiable. Assumption [2] 
is still weaker and we can rely on it without any doubt. 

Concerning Assumption [3] the situation is a bit more complicated. To verify that all 
2n-i ma t r i xes have full rank is too exhausting. On the other hand, an assumption that 
(n — 1 X p) matrix has rank p is quite weak (for n great enough) and we can rely on this is 
fulfilled. 

However, if an intercept is considered, Assumption [3] is always disrupted for oj = • • • = 
o n _ 1 = — for the first column of the matrix contains only zeros and so the rank of the 



matrix is less or equal to p — 1. Thus, if an intercept is considered, Assertion 11 (namely 



Lemma 10) is not proved and we lose the certainity that BSA always finds the exact LTS 
estimate. To resolve this problem, Assumption [3] has to be reformulated to the following 
form. 



Assumption 4. For all (°i, • • • , ° n -i) € (x" = i{+, — }) \ {( — >•■•! — )} the matrix of the 
system of equations 

(xi o! x 2 ) T P = J/1 Ox J/2 



(x\ o n _! x n ) T f3 = yi o n _j y n 

has rank p. 

Assuming that this Assumption [4] is fulfilled instead of Assumption [3] we can reprove 



Lemma 10 for models where an intercept is considered as follows. 
Proof 

The only difference between the proofs is selecting of the signs o i ,...,o„_ 1 . In the 
original Lemma we can select them arbitrarily, here, if Assumption [4] is considered, we 
demand (3k £ {I, . . . , n — l}(o fc ^ — ). The rest of proof is completely the same. 

Q.E.D 



Conclusion 

BSA proved to be quick enough to be usable for reasonably large data. Of course probabilistic 
algorithms are faster and they have found the exact solution of Problem [l] as well in all 
cases the author tested. BSA algorithm has been implemented in MATLAB and in C++ 
(by Roman Kapl) and is available by email. 
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